Are Human-Input Seeds Good Enough for Entity Set Expansion? Seeds Rewriting by Leveraging Wikipedia Semantic Knowledge

Authors

  • Zhenyu Qi
  • Kang Liu
  • Jun Zhao
Abstract

Entity Set Expansion is an important task in open information extraction: it expands a given partial seed set into a more complete set belonging to the same semantic class. Many previous studies have shown that seed quality strongly influences expansion performance, since human-input seeds may be ambiguous, sparse, etc. In this paper, we propose a novel method that generates new, high-quality seeds to replace the original, poor-quality ones. Our method leverages Wikipedia as a source of semantic knowledge to measure the semantic relatedness and ambiguity of each seed. Moreover, to avoid seed sparseness, we use web resources to measure each seed's population. New seeds are then generated to replace the original, poor-quality seeds. Experimental results show that the new seed sets generated by our method improve entity expansion performance by up to 9.1% on average over the original seed sets.
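
The abstract names three seed-quality signals (semantic relatedness, ambiguity, and population) but gives no formulas, so the following is only a minimal sketch of how such signals could be combined, not the authors' method. Everything concrete here is an assumption made for illustration: relatedness as Jaccard overlap of hypothetical Wikipedia link sets, ambiguity as an inverse sense count, population as a normalized web hit count, the product used to combine them, the threshold value, and all of the toy data.

```python
def jaccard(a, b):
    """Jaccard overlap of two sets (assumed stand-in for semantic relatedness)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def seed_score(term, others, links, senses, hits):
    """Toy quality score: relatedness x unambiguity x population (all assumptions)."""
    if not others:
        return 0.0
    relatedness = sum(jaccard(links[term], links[o]) for o in others) / len(others)
    unambiguity = 1.0 / senses[term]                # fewer senses -> less ambiguous
    population = hits[term] / max(hits.values())    # normalized hit count
    return relatedness * unambiguity * population


def rewrite_seeds(seeds, candidates, links, senses, hits, threshold=0.1):
    """Replace each seed scoring below the threshold with the best unused candidate."""
    result = list(seeds)
    pool = [c for c in candidates if c not in seeds]
    for i, seed in enumerate(seeds):
        others = [s for s in result if s != seed]
        current = seed_score(seed, others, links, senses, hits)
        if current >= threshold or not pool:
            continue
        best = max(pool, key=lambda c: seed_score(c, others, links, senses, hits))
        if seed_score(best, others, links, senses, hits) > current:
            result[i] = best
            pool.remove(best)
    return result


# Hypothetical toy data; none of these values come from Wikipedia or the paper.
LINKS = {
    "Apple": {"fruit", "tree", "company"},
    "Google": {"company", "software", "search"},
    "Microsoft": {"company", "software", "windows"},
    "IBM": {"company", "software", "hardware"},
    "Banana": {"fruit", "tree"},
}
SENSES = {"Apple": 3, "Google": 1, "Microsoft": 1, "IBM": 1, "Banana": 2}
HITS = {"Apple": 900, "Google": 1000, "Microsoft": 800, "IBM": 600, "Banana": 300}

new_seeds = rewrite_seeds(["Apple", "Google", "Microsoft"], ["IBM", "Banana"],
                          LINKS, SENSES, HITS)
```

In this toy setting the ambiguous seed "Apple" scores below the threshold and is swapped for the candidate that is most related to the remaining seeds, while the other two seeds are kept.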


Similar resources

Choosing Better Seeds for Entity Set Expansion by Leveraging Wikipedia Semantic Knowledge

Entity Set Expansion, which refers to expanding a human-input seed set into a more complete set belonging to the same semantic category, is an important task in open information extraction. Because human-input seeds may be ambiguous, sparse, etc., seed quality has a great influence on expansion performance, as many previous studies have shown. To improve seed quality, t...


Where Are You Settling Down: Geo-locating Twitter Users Based on Tweets and Social Networks

Time Description of activity 8:30-18:00 Conference Registration 9:10-9:30 Conference opening 9:30-10:30 Keynote Speaker 1: Norbert Fuhr, University of Duisburg-Essen 10:30-11:00 Coffee break Session 1: Evaluation and user studies 11:00-11:30 The Reusability of a Diversified Search Test Collection 11:30-12:00 One Click One Revisited: Enhancing Evaluation based on Information Units 12:00-12:30 A Co...


Iterative Set Expansion of Named Entities using the Web

Set expansion refers to expanding a given partial set of “seed” objects into a more complete set. One system that does set expansion is SEAL (Set Expander for Any Language), which expands entities automatically by utilizing resources from the Web in a language independent fashion. In a previous study, SEAL showed good set expansion performance using three seed entities; however, when given a la...


Harvesting Domain-Specific Terms using Wikipedia

We present a simple but effective method of automatically extracting domain-specific terms using Wikipedia as training data (i.e. self-supervised learning). Our first goal is to show, using human judgments, that Wikipedia categories are domain-specific and thus can replace manually annotated terms. Second, we show that identifying such terms using harvested Wikipedia categories and entities as s...


Wikidata: A Platform for Data Integration and Dissemination for the Life Sciences and Beyond

Wikidata is an open, Semantic Web-compatible database that anyone can edit. This ‘data commons’ provides structured data for Wikipedia articles and other applications. Every article on Wikipedia has a hyperlink to an editable item in this database. This unique connection to the world’s largest community of volunteer knowledge editors could help make Wikidata a key hub within the greater Semanti...




Publication date: 2012